Formemes in English-Czech Deep Syntactic MT
Abstract
One of the most notable recent improvements to the TectoMT English-to-Czech translation system is a systematic and theoretically supported revision of formemes, i.e. the annotation of morpho-syntactic features of content words in deep dependency syntactic structures based on the Prague tectogrammatics theory. Our modifications aim to reduce data sparsity, increase consistency across languages, and broaden the range of uses of this markup. Formemes can be used not only in MT but also in various other NLP tasks.
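To make the idea of a formeme more concrete, here is a minimal sketch of how such a label might be decomposed. The abstract does not spell out the notation, so this assumes labels of the form `sempos:prep+form`: a coarse semantic part of speech, optional prepositions or conjunctions joined by `+`, and a final morphological or syntactic form (e.g. `n:v+6` for a Czech noun governed by the preposition *v* in case 6, or `v:fin` for a finite verb). The `Formeme` class and `parse_formeme` function are hypothetical illustrations, not part of TectoMT or Treex.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Formeme:
    """Hypothetical structured view of a formeme label (assumed notation)."""
    sempos: str                 # coarse semantic part of speech, e.g. "n", "v", "adj", "adv"
    functors: Tuple[str, ...]   # prepositions/conjunctions, e.g. ("v",) in "n:v+6"
    form: Optional[str]         # case or syntactic form, e.g. "6", "fin", "subj"


def parse_formeme(label: str) -> Formeme:
    """Split a label such as 'n:v+6' into its parts, assuming 'sempos:prep+form'."""
    sempos, sep, rest = label.partition(":")
    if not sep or not rest:
        # Bare labels such as 'adv' carry only the semantic part of speech.
        return Formeme(sempos, (), None)
    # Everything before the last '+' is a preposition/conjunction; the last part is the form.
    *functors, form = rest.split("+")
    return Formeme(sempos, tuple(functors), form)


if __name__ == "__main__":
    # An illustrative English/Czech pair for the same content word.
    en, cs = parse_formeme("n:in+X"), parse_formeme("n:v+6")
    print(cs)                      # Formeme(sempos='n', functors=('v',), form='6')
    print(en.sempos == cs.sempos)  # True: both labels describe a noun
```

Splitting the label this way shows why a shared, consistent inventory matters across languages: the semantic part of speech can be compared directly between the English and Czech sides, while the language-specific preposition and form stay separate.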
Similar resources
Towards English-to-Czech MT via Tectogrammatical Layer
We present an overview of an English-to-Czech machine translation system. The system relies on transfer at the tectogrammatical (deep syntactic) layer of the language description. We report on the progress of the linguistic annotation of the English tectogrammatical layer and also on the first end-to-end evaluation of our syntax-based MT system.
Translation of "It" in a Deep Syntax Framework
We present a novel approach to the translation of the English personal pronoun it to Czech. We conduct a linguistic analysis on how the distinct categories of it are usually mapped to their Czech counterparts. Armed with these observations, we design a discriminative translation model of it, which is then integrated into the TectoMT deep syntax MT framework. Features in the model take advantage...
Tackling Sparse Data Issue in Machine Translation Evaluation
We illustrate and explain problems of n-gram-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric, SemPOS, based on the deep-syntactic representation of the sentence, tackles the issue and retains its performance for translation into English as well.
An MT System Recycled
This paper describes an attempt to recycle parts of the Czech-to-Russian machine translation (MT) system in the new Czech-to-English MT system. The paper describes the overall architecture of the new system and the details of the modules that have been added. Special attention is paid to the problem of named entity recognition and to the method of automatic acquisition of lexico-syntactic in...
Targeted Paraphrasing on Deep Syntactic Layer for MT Evaluation
In this paper, we present a method of improving the quality of machine translation (MT) evaluation of Czech sentences via targeted paraphrasing of reference sentences on a deep syntactic layer. For this purpose, we employ the NLP framework Treex and extend it with modules for targeted paraphrasing and word order changes. Automatic scores computed using these paraphrased reference sentences show higher ...